Learning and Applying Competitive Strategies
Abstract
Learning reusable sequences can support the development of expertise in many domains, either by improving decision-making quality or by decreasing execution time. This paper introduces and evaluates a method to learn action sequences for generalized states from prior problem experience. From experienced sequences, the method induces the context that underlies a sequence of actions. Empirical results indicate that the sequences and contexts learned for a class of problems are those deemed important by experts for that particular class, and can be used to select appropriate action sequences when solving problems in it.

Repeated problem solving can provide salient, reusable data to a learner. This paper focuses on programs that acquire expertise in a particular domain. The thesis of our work is that previously experienced sequences are an important knowledge source for such programs, and that appropriate organization of those sequences can support reuse of that knowledge in novel as well as familiar situations (Lock 2003). The primary contributions of this paper are a learning method for the acquisition of action sequences (contiguous subsequences of decisions) and an empirical demonstration of their efficacy and salience as competitive strategies in game playing. We claim not that sequences are all there is to learn, but that they are a powerful part of expert behavior, and well worth learning. In the work reported here, our method learns many useful action sequences and highlights the ones human experts consider significant.

The context of an action sequence is the set of world states where it is relevant. Rather than deduce all possible contexts from the operators available to an agent, or from inspection of the search space, our method exploits the knowledge inherent in experience to learn context. It associates a sequence of actions with the set of origin states where the sequence's execution began. The underlying assumption is that some commonality among the origin states drives the same sequence of moves. Discovering context is thus reduced to generalizing over the origin states. A learned context is paired with its sequence, and subsequently recommended as a course of action when the context matches the current state of the world.

Macro operators to speed problem solving and planning have been widely studied. MACROPS learned to compose a sequence of operators, but it was limited to domains whose operators have well-defined preconditions and postconditions, and whose goal state is a conjunction of well-defined subgoals (Fikes, Hart et al. 1972). Other programs learn macros that achieve subgoals without undoing previously solved subgoals (Korf 1985; Tadepalli and Natarajan 1996). A game, however, may lack accessible subgoals, and actions in games can have subtle, far-reaching effects that are not clearly delineated. MACLEARN and MICRO-HILLARY use a domain's heuristic function to selectively learn macros from problem-solving experience (Iba 1989; Finkelstein and Markovitch 1998). Our method assumes the existence of neither subgoals nor a heuristic function. Planning work for Markov decision processes learns macro actions to achieve subgoals (Moore, Baird et al. 1998; Sutton, Precup et al. 1999; Precup 2000). Most closely related to our method is reinforcement learning work that discovers useful sequences online from successful experience (McGovern 2002). Neither of McGovern's methods generalizes context, however.
Our approach uses problem-solving experience to learn the context for a sequence, and uses that knowledge to make decisions. The next section describes how this method learns action sequences and their contexts, and applies them. Subsequent sections describe a program that applies learned competitive strategies, detail our experimental design, and discuss results and related work.

1. Learning action sequences with contexts

We explore sequence learning first within the domain of two-player, perfect-information, finite board games, using the following definitions. (Broader applicability is addressed in Section 3.) A problem-solving experience is a contest, the agents are contestants, and a legal move is an action. A state is uniquely described by the game board and whose turn it is to move. A place on a game board where a contestant's piece may be placed is a location. All locations on a game board are numbered in row order, as in Figure 1. A blank is an empty location. A pattern is a set of locations and their contents. Our method learns action sequences only from successful experience; in game playing, those are move sequences from contests in which a particular contestant won, or drew at a draw game (one where the value of the root of the game tree is a draw under full minimax).

We consider two games here: lose tic-tac-toe and five men's morris. Lose tic-tac-toe, where the first contestant to place three pieces in a row in Figure 1(a) loses, is far more difficult for people to learn than ordinary tic-tac-toe. The object is to avoid a certain pattern rather than to achieve it, and a non-optimal move is a fatal error. Flawless lose tic-tac-toe play involves different strategies for each contestant, some of which are non-intuitive and go undiscovered by many people (Cohen 1972). In five men's morris, the contestants (black and white) each have five pieces, which they take turns placing on the board in Figure 1(b). Once the pieces are all placed, a move slides a piece from one location on the board to an adjacent, empty location. A contestant who achieves a mill (three owned pieces on a line drawn on the board) removes any one opposition piece from the board. A contestant reduced to two pieces, or unable to slide, loses. This is a challenging game, with a significantly larger game graph than lose tic-tac-toe.

1.1 Overview of the learning method

The program SeqLearner observes computer programs of various skill levels in competition at a game. After each contest, SeqLearner examines the contest's actions for sequences of prespecified lengths. It collects action sequences that are duets (which include both agents' actions) and solos (which include only one agent's actions). In some two-agent domains, an agent determinedly executes a sequence of actions, so long as they are legal, no matter what the other agent does. If one gathered only sequences that include both agents' actions, the other agent's varied responses to a solo could cast instances of the same solo as different sequences. Therefore, SeqLearner also gathers solos. In the trace of a contest of length n, the number of duets of any length is O(n), as is the number of solos. To associate these sequences with states, SeqLearner organizes the collected data into SeqTable, a sequence hash table. A key for SeqTable is a sequence; the value stored there is a list of states. For each extracted sequence, SeqLearner records in SeqTable the sequence's origin state, where the sequence's execution began, as sketched below.
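The following is a minimal sketch of the collection step, assuming a contest trace is recorded as a list of (agent, action, origin_state) triples; the function names, the choice of sequence lengths, and the (kind, actions) key format are our own illustration, not the paper's implementation.

```python
from collections import defaultdict

def extract_sequences(trace, lengths=(2, 3, 4)):
    """Yield (key, origin_state) pairs from one contest trace.

    `trace` is a list of (agent, action, origin_state) steps in turn
    order; `lengths` are the prespecified sequence lengths. A duet
    keeps both agents' actions from a contiguous window; a solo keeps
    only one agent's. Keys are tagged so identical action tuples
    collected both ways remain distinct.
    """
    for k in lengths:
        # Duets: contiguous windows over the whole trace.
        for i in range(len(trace) - k + 1):
            window = trace[i:i + k]
            actions = tuple(action for _, action, _ in window)
            yield ("duet", actions), window[0][2]
    for agent in {agent for agent, _, _ in trace}:
        own = [step for step in trace if step[0] == agent]
        for k in lengths:
            # Solos: contiguous windows over one agent's moves only.
            for i in range(len(own) - k + 1):
                window = own[i:i + k]
                actions = tuple(action for _, action, _ in window)
                yield ("solo", actions), window[0][2]

def record_contest(seq_table, trace):
    """Append each extracted sequence's origin state to SeqTable."""
    for key, origin in extract_sequences(trace):
        seq_table[key].append(origin)

# SeqTable: a (kind, actions) key maps to the list of origin states.
seq_table = defaultdict(list)
```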
For a repeatedly encountered sequence, SeqLearner extracts a context from the set of origins associated with it. Periodically, SeqLearner sweeps through SeqTable, examining one sequence at a time. For game playing, SeqLearner identifies the maximal common pattern that exists in all the stored states as the context for the sequence. SeqLearner takes each location of the board as a feature and examines its value in each of those states, retaining only features with the same value on all the boards. Any location whose value is not retained is labeled # (don't care). The resultant maximal common pattern is associated with its sequence in a context pair. Both contexts and sequences are normalized for horizontal, vertical, and diagonal reflection, plus rotations of 90°, 180°, and 270°. As a result, a sequence is retrievable whenever its context or a symmetric equivalent arises. With additional experience, the candidate sets for each sequence are likely to grow, engendering new, more general context pairs. If more than one sequence leads to the formation of the same context, those sequences are merged into a sequence tree (a tree of alternative, overlapping sequences) to form a single context pair. For example, three sequences with an identical context are merged into the competitive strategy in Figure 2.

Given their inductive origin, these context pairs are not applied immediately in problem solving; incorrect context pairs must first be carefully filtered out. A variant of PWL (Probabilistic Weight Learning) is used to weight them (Epstein 1994). The algorithm determines how well each context pair simulates expert behavior. After every experience, it takes the states where it was the expert's turn to act, checks whether a context pair's advice supports or opposes the expert's decision, and revises the pair's weight in [0, 1] accordingly. A context pair's weight rises consistently when the pair is reliable and frequently applicable. The generalization and weighting steps are sketched below.
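This sketch covers generalization over origin states and a PWL-style weight revision, assuming a state is stored as a flat tuple of location contents in row order; maximal_common_pattern, matches, and update_weight are illustrative names, and update_weight only approximates the bookkeeping of the actual PWL variant (Epstein 1994).

```python
DONT_CARE = "#"

def maximal_common_pattern(states):
    """Generalize origin states into a context: a location keeps its
    value only if it is identical across every stored state; any
    other location is labeled '#' (don't care)."""
    first, *rest = states
    return tuple(
        value if all(state[i] == value for state in rest) else DONT_CARE
        for i, value in enumerate(first)
    )

def matches(context, state):
    """True when the current board state instantiates a context."""
    return all(c in (DONT_CARE, s) for c, s in zip(context, state))

def update_weight(weight, supported, opposed, step=0.05):
    """PWL-style revision of a context pair's weight in [0, 1]: nudge
    it toward 1 for each expert decision the pair's advice supported,
    and toward 0 for each it opposed (an assumed simplification)."""
    for _ in range(supported):
        weight += step * (1.0 - weight)
    for _ in range(opposed):
        weight -= step * weight
    return weight

# Two 3x3 origin boards that drove the same sequence:
s1 = ("X", "O", " ", " ", "O", " ", " ", " ", "X")
s2 = ("X", " ", " ", " ", "O", " ", "X", " ", "X")
context = maximal_common_pattern([s1, s2])
# -> ('X', '#', ' ', ' ', 'O', ' ', '#', ' ', 'X')
assert matches(context, s1) and matches(context, s2)
```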